MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Benchmarking of methods for genomic taxonomy.

Internal identifier: 001D31 (Main/Exploration); previous: 001D30; next: 001D32

Authors: Mette V. Larsen [Denmark]; Salvatore Cosentino; Oksana Lukjancenko; Dhany Saputra; Simon Rasmussen; Henrik Hasman; Thomas Sicheritz-Pontén; Frank M. Aarestrup; David W. Ussery; Ole Lund

Source: Journal of clinical microbiology (eISSN 1098-660X), 2014

RBID : pubmed:24574292

Abstract

One of the first issues that emerges when a prokaryotic organism of interest is encountered is the question of what it is, that is, which species it is. The 16S rRNA gene formed the basis of the first method for sequence-based taxonomy and has had a tremendous impact on the field of microbiology. Nevertheless, the method has been found to have a number of shortcomings. In the current study, we trained and benchmarked five methods for whole-genome sequence-based prokaryotic species identification on a common data set of complete genomes: (i) SpeciesFinder, which is based on the complete 16S rRNA gene; (ii) Reads2Type, which searches for species-specific 50-mers in either the 16S rRNA gene or the gyrB gene (for the Enterobacteriaceae family); (iii) the ribosomal multilocus sequence typing (rMLST) method, which samples up to 53 ribosomal genes; (iv) TaxonomyFinder, which is based on species-specific functional protein domain profiles; and finally (v) KmerFinder, which examines the number of co-occurring k-mers (substrings of k nucleotides in DNA sequence data). The performances of the methods were subsequently evaluated on three data sets of short sequence reads or draft genomes from public databases. In total, the evaluation sets comprised sequence data from more than 11,000 isolates covering 159 genera and 243 species. Our results indicate that methods that sample only chromosomal, core genes have difficulties in distinguishing closely related species which only recently diverged. The KmerFinder method had the overall highest accuracy and correctly identified from 93% to 97% of the isolates in the evaluation sets.
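
The idea of counting co-occurring k-mers, as mentioned for KmerFinder in the abstract, can be illustrated with a minimal sketch. This is not KmerFinder's actual implementation or scoring scheme: the function names, the toy sequences, the species labels, and the choice of k = 4 are all assumptions made for illustration only.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_counts(query, references, k):
    """Count, for each reference, how many distinct k-mers it shares with the query."""
    query_kmers = kmers(query, k)
    return {name: len(query_kmers & kmers(ref, k)) for name, ref in references.items()}


def best_match(query, references, k):
    """Pick the reference sharing the most k-mers with the query (ties: first seen)."""
    counts = shared_kmer_counts(query, references, k)
    return max(counts, key=counts.get)


# Toy, made-up sequences; real genomes are millions of bases long.
REFERENCES = {
    "species_A": "ATGCGTACGTTAGC",
    "species_B": "TTGACCGGTAAGCT",
}
QUERY = "GCGTACGTTA"

print(shared_kmer_counts(QUERY, REFERENCES, k=4))
print(best_match(QUERY, REFERENCES, k=4))
```

Using sets counts each distinct k-mer once; a real tool would also have to deal with reverse complements, sequencing errors, and a much larger reference database, none of which this sketch attempts.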

DOI: 10.1128/JCM.02981-13
PubMed: 24574292



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Benchmarking of methods for genomic taxonomy.</title>
<author>
<name sortKey="Larsen, Mette V" sort="Larsen, Mette V" uniqKey="Larsen M" first="Mette V" last="Larsen">Mette V. Larsen</name>
<affiliation wicri:level="1">
<nlm:affiliation>Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark.</nlm:affiliation>
<country xml:lang="fr">Danemark</country>
<wicri:regionArea>Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kongens Lyngby</wicri:regionArea>
<wicri:noRegion>Kongens Lyngby</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Cosentino, Salvatore" sort="Cosentino, Salvatore" uniqKey="Cosentino S" first="Salvatore" last="Cosentino">Salvatore Cosentino</name>
</author>
<author>
<name sortKey="Lukjancenko, Oksana" sort="Lukjancenko, Oksana" uniqKey="Lukjancenko O" first="Oksana" last="Lukjancenko">Oksana Lukjancenko</name>
</author>
<author>
<name sortKey="Saputra, Dhany" sort="Saputra, Dhany" uniqKey="Saputra D" first="Dhany" last="Saputra">Dhany Saputra</name>
</author>
<author>
<name sortKey="Rasmussen, Simon" sort="Rasmussen, Simon" uniqKey="Rasmussen S" first="Simon" last="Rasmussen">Simon Rasmussen</name>
</author>
<author>
<name sortKey="Hasman, Henrik" sort="Hasman, Henrik" uniqKey="Hasman H" first="Henrik" last="Hasman">Henrik Hasman</name>
</author>
<author>
<name sortKey="Sicheritz Ponten, Thomas" sort="Sicheritz Ponten, Thomas" uniqKey="Sicheritz Ponten T" first="Thomas" last="Sicheritz-Pontén">Thomas Sicheritz-Pontén</name>
</author>
<author>
<name sortKey="Aarestrup, Frank M" sort="Aarestrup, Frank M" uniqKey="Aarestrup F" first="Frank M" last="Aarestrup">Frank M. Aarestrup</name>
</author>
<author>
<name sortKey="Ussery, David W" sort="Ussery, David W" uniqKey="Ussery D" first="David W" last="Ussery">David W. Ussery</name>
</author>
<author>
<name sortKey="Lund, Ole" sort="Lund, Ole" uniqKey="Lund O" first="Ole" last="Lund">Ole Lund</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:24574292</idno>
<idno type="pmid">24574292</idno>
<idno type="doi">10.1128/JCM.02981-13</idno>
<idno type="wicri:Area/PubMed/Corpus">001A43</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001A43</idno>
<idno type="wicri:Area/PubMed/Curation">001A43</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001A43</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001982</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001982</idno>
<idno type="wicri:Area/Ncbi/Merge">000D01</idno>
<idno type="wicri:Area/Ncbi/Curation">000D01</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000D01</idno>
<idno type="wicri:Area/Main/Merge">001D46</idno>
<idno type="wicri:Area/Main/Curation">001D31</idno>
<idno type="wicri:Area/Main/Exploration">001D31</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Benchmarking of methods for genomic taxonomy.</title>
<author>
<name sortKey="Larsen, Mette V" sort="Larsen, Mette V" uniqKey="Larsen M" first="Mette V" last="Larsen">Mette V. Larsen</name>
<affiliation wicri:level="1">
<nlm:affiliation>Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark.</nlm:affiliation>
<country xml:lang="fr">Danemark</country>
<wicri:regionArea>Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kongens Lyngby</wicri:regionArea>
<wicri:noRegion>Kongens Lyngby</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Cosentino, Salvatore" sort="Cosentino, Salvatore" uniqKey="Cosentino S" first="Salvatore" last="Cosentino">Salvatore Cosentino</name>
</author>
<author>
<name sortKey="Lukjancenko, Oksana" sort="Lukjancenko, Oksana" uniqKey="Lukjancenko O" first="Oksana" last="Lukjancenko">Oksana Lukjancenko</name>
</author>
<author>
<name sortKey="Saputra, Dhany" sort="Saputra, Dhany" uniqKey="Saputra D" first="Dhany" last="Saputra">Dhany Saputra</name>
</author>
<author>
<name sortKey="Rasmussen, Simon" sort="Rasmussen, Simon" uniqKey="Rasmussen S" first="Simon" last="Rasmussen">Simon Rasmussen</name>
</author>
<author>
<name sortKey="Hasman, Henrik" sort="Hasman, Henrik" uniqKey="Hasman H" first="Henrik" last="Hasman">Henrik Hasman</name>
</author>
<author>
<name sortKey="Sicheritz Ponten, Thomas" sort="Sicheritz Ponten, Thomas" uniqKey="Sicheritz Ponten T" first="Thomas" last="Sicheritz-Pontén">Thomas Sicheritz-Pontén</name>
</author>
<author>
<name sortKey="Aarestrup, Frank M" sort="Aarestrup, Frank M" uniqKey="Aarestrup F" first="Frank M" last="Aarestrup">Frank M. Aarestrup</name>
</author>
<author>
<name sortKey="Ussery, David W" sort="Ussery, David W" uniqKey="Ussery D" first="David W" last="Ussery">David W. Ussery</name>
</author>
<author>
<name sortKey="Lund, Ole" sort="Lund, Ole" uniqKey="Lund O" first="Ole" last="Lund">Ole Lund</name>
</author>
</analytic>
<series>
<title level="j">Journal of clinical microbiology</title>
<idno type="eISSN">1098-660X</idno>
<imprint>
<date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Archaea (genetics)</term>
<term>Bacteria (genetics)</term>
<term>Bacterial Proteins (genetics)</term>
<term>Benchmarking (methods)</term>
<term>Classification (methods)</term>
<term>DNA, Bacterial (genetics)</term>
<term>Genomics (methods)</term>
<term>Multilocus Sequence Typing (methods)</term>
<term>RNA, Ribosomal, 16S (genetics)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>ADN bactérien (génétique)</term>
<term>ARN ribosomique 16S (génétique)</term>
<term>Archéobactéries (génétique)</term>
<term>Bactéries (génétique)</term>
<term>Classification ()</term>
<term>Génomique ()</term>
<term>Protéines bactériennes (génétique)</term>
<term>Référenciation ()</term>
<term>Typage par séquençage multilocus ()</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="genetics" xml:lang="en">
<term>Bacterial Proteins</term>
<term>DNA, Bacterial</term>
<term>RNA, Ribosomal, 16S</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Archaea</term>
<term>Bacteria</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>ADN bactérien</term>
<term>ARN ribosomique 16S</term>
<term>Archéobactéries</term>
<term>Bactéries</term>
<term>Protéines bactériennes</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Benchmarking</term>
<term>Classification</term>
<term>Genomics</term>
<term>Multilocus Sequence Typing</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Classification</term>
<term>Génomique</term>
<term>Référenciation</term>
<term>Typage par séquençage multilocus</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">One of the first issues that emerges when a prokaryotic organism of interest is encountered is the question of what it is, that is, which species it is. The 16S rRNA gene formed the basis of the first method for sequence-based taxonomy and has had a tremendous impact on the field of microbiology. Nevertheless, the method has been found to have a number of shortcomings. In the current study, we trained and benchmarked five methods for whole-genome sequence-based prokaryotic species identification on a common data set of complete genomes: (i) SpeciesFinder, which is based on the complete 16S rRNA gene; (ii) Reads2Type, which searches for species-specific 50-mers in either the 16S rRNA gene or the gyrB gene (for the Enterobacteriaceae family); (iii) the ribosomal multilocus sequence typing (rMLST) method, which samples up to 53 ribosomal genes; (iv) TaxonomyFinder, which is based on species-specific functional protein domain profiles; and finally (v) KmerFinder, which examines the number of co-occurring k-mers (substrings of k nucleotides in DNA sequence data). The performances of the methods were subsequently evaluated on three data sets of short sequence reads or draft genomes from public databases. In total, the evaluation sets comprised sequence data from more than 11,000 isolates covering 159 genera and 243 species. Our results indicate that methods that sample only chromosomal, core genes have difficulties in distinguishing closely related species which only recently diverged. The KmerFinder method had the overall highest accuracy and correctly identified from 93% to 97% of the isolates in the evaluation sets.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Danemark</li>
</country>
</list>
<tree>
<noCountry>
<name sortKey="Aarestrup, Frank M" sort="Aarestrup, Frank M" uniqKey="Aarestrup F" first="Frank M" last="Aarestrup">Frank M. Aarestrup</name>
<name sortKey="Cosentino, Salvatore" sort="Cosentino, Salvatore" uniqKey="Cosentino S" first="Salvatore" last="Cosentino">Salvatore Cosentino</name>
<name sortKey="Hasman, Henrik" sort="Hasman, Henrik" uniqKey="Hasman H" first="Henrik" last="Hasman">Henrik Hasman</name>
<name sortKey="Lukjancenko, Oksana" sort="Lukjancenko, Oksana" uniqKey="Lukjancenko O" first="Oksana" last="Lukjancenko">Oksana Lukjancenko</name>
<name sortKey="Lund, Ole" sort="Lund, Ole" uniqKey="Lund O" first="Ole" last="Lund">Ole Lund</name>
<name sortKey="Rasmussen, Simon" sort="Rasmussen, Simon" uniqKey="Rasmussen S" first="Simon" last="Rasmussen">Simon Rasmussen</name>
<name sortKey="Saputra, Dhany" sort="Saputra, Dhany" uniqKey="Saputra D" first="Dhany" last="Saputra">Dhany Saputra</name>
<name sortKey="Sicheritz Ponten, Thomas" sort="Sicheritz Ponten, Thomas" uniqKey="Sicheritz Ponten T" first="Thomas" last="Sicheritz-Pontén">Thomas Sicheritz-Pontén</name>
<name sortKey="Ussery, David W" sort="Ussery, David W" uniqKey="Ussery D" first="David W" last="Ussery">David W. Ussery</name>
</noCountry>
<country name="Danemark">
<noRegion>
<name sortKey="Larsen, Mette V" sort="Larsen, Mette V" uniqKey="Larsen M" first="Mette V" last="Larsen">Mette V. Larsen</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
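
As a rough illustration of how the record above could be read programmatically, here is a minimal Python sketch using only the standard library. The record uses the `wicri:` and `nlm:` prefixes without declaring them, so a namespace-aware parser would reject it as written; the sketch works around this by wrapping the text in a hypothetical `<wrapper>` element that binds both prefixes to placeholder URIs. The wrapper element, the placeholder URIs, and the function name `summarize` are assumptions, not part of Dilib or Wicri. The embedded record is a shortened excerpt of the one shown above.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the record: two authors, one affiliation, two identifiers.
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Benchmarking of methods for genomic taxonomy.</title>
<author>
<name sortKey="Larsen, Mette V" first="Mette V" last="Larsen">Mette V. Larsen</name>
<affiliation wicri:level="1">
<nlm:affiliation>Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark.</nlm:affiliation>
<country xml:lang="fr">Danemark</country>
</affiliation>
</author>
<author>
<name sortKey="Lund, Ole" first="Ole" last="Lund">Ole Lund</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="pmid">24574292</idno>
<idno type="doi">10.1128/JCM.02981-13</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def summarize(record_text):
    """Extract title, author names, and DOI from a Wicri-style record."""
    # Assumption: placeholder URIs, used only so the undeclared prefixes parse.
    wrapped = (
        '<wrapper xmlns:wicri="urn:example:wicri" xmlns:nlm="urn:example:nlm">'
        + record_text
        + "</wrapper>"
    )
    root = ET.fromstring(wrapped)
    file_desc = root.find("record/TEI/teiHeader/fileDesc")
    return {
        "title": file_desc.findtext("titleStmt/title"),
        "authors": [n.text for n in file_desc.findall("titleStmt/author/name")],
        "doi": file_desc.findtext("publicationStmt/idno[@type='doi']"),
    }


print(summarize(RECORD))
```

The paths are restricted to `titleStmt` because the full record repeats the author list under `sourceDesc` and again under `affiliations`; searching the whole tree would return each name several times.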

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001D31 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001D31 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:24574292
   |texte=   Benchmarking of methods for genomic taxonomy.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:24574292" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021